Grade of Membership Model and Visualization for RNA-seq data using CountClust

نویسندگان

  • Kushal K Dey
  • Chiaowen Joyce Hsiao
  • Matthew Stephens
چکیده

Grade of membership or GoM models (also known as admixture models or Latent Dirichlet Allocation”) are a generalization of cluster models that allow each sample to have membership in multiple clusters. It is widely used to model ancestry of individuals in population genetics based on SNP/ microsatellite data and also in natural language processing for modeling documents [1, 3]. This R package implements tools to visualize the clusters obtained from fitting topic models using a Structure plot [2] and extract the top features/genes that distinguish the clusters. In presence of known technical or batch effects, the package also allows for correction of these confounding effects. CountClust version: 1.2.0 1 This document used the vignette from Bioconductor package DESeq2, cellTree as knitr template

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering RNA-seq expression data using grade of membership models

Grade of membership models, also known as “admixture models”, “topic models” or “Latent Dirichlet Allocation”, are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple “populations”, and in natural language processing to model documents h...

متن کامل

Visualizing the structure of RNA-seq expression data using grade of membership models

Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents h...

متن کامل

Investigating the Function of Predicted Proteins from RNA-Seq Data in Holstein and Cholistani Cattle Breeds

This study was performed to determine the digital expression profile of different genes expressed in Holstein and Cholistani breeds as well as to evaluate the performance of predicted proteins derived from differentially expressed genes between these two breeds using RNA-Seq data. For this purpose, the whole mRNA sequence for a blood sample of American Holstein and Pakistani Cholistani cattle p...

متن کامل

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering thevast amount of whole transcriptome sequencing data, it seems necessary to develop analgorithm to assemble transcriptome data. In this study we propose an algorithm fortranscriptome assembly in the absence of a reference genome. First, the contiguous sequencesare generated using de Bruijn graph with d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016